CLAIMS:

1. A method of identifying one or more proteins in an unannotated DNA sequence,
the method comprising: 

(a) dividing the DNA sequence into a plurality of sequence fragments, each
fragment being of substantially the same length and from about 300 to 5000 base pairs
long;

(b) performing a six frame translation of each of the DNA sequence fragments 
to obtain six translated amino acid sequence fragments for each DNA sequence 
fragment; 

(c) subjecting each of the translated sequence fragments to theoretical
digestion to obtain a plurality of cleaved peptide sequences;

(d) comparing experimental empirical data for peptide fragments from a
protein digested in the same manner as the theoretical digestion at step (c) with the
theoretical data generated in step (c) for each of the translated sequence fragments to
identify one or more translated sequence fragments which include a significant number
of peptides present in the digested protein.

2. The method of claim 1 wherein the step (a) of dividing the DNA sequence into a 
plurality of sequence fragments is performed before the step (b) of performing the six
frame translation. 

3. The method of claim 1 wherein the step (a) of dividing the DNA sequence into a
plurality of sequence fragments is performed after the step (b) of performing the six 
frame translation. 

4. The method of any preceding claim wherein theoretically generated peptide 
masses are compared to the masses of the peptides experimentally generated by the
digested protein and the sequence fragment which has the greatest number of
theoretical peptide masses correlating to the empirical data indicates the likely location 
of the protein of interest in the DNA sequence. 

5. The method of any preceding claim wherein the masses of the peptides 
experimentally generated from the digested protein are determined by mass
spectrometry.

6. The method of any preceding claim wherein the DNA sequence is duplicated 
and the original and duplicate are split in such a manner that the sequence fragments 
from the original overlap divisions in the original genome sequence. 

7. The method of any preceding claim wherein the sequence fragments are from
800 to 1200 base pairs long.

8. The method of claim 7 wherein the sequence fragments are around 1000 to 1050
bases long.

9. The method of any preceding claim wherein steps (c) and (d) are performed
twice using different enzymes and the data from the two digests is combined and
analysed to identify the protein coding region of interest.

10. The method of any preceding claim wherein in the theoretical digest all
theoretical peptides which contain a stop codon are discarded.

11. The method of any preceding claim wherein the fragments are numbered so that
an overlapping fragment is numbered n where the fragments it overlaps are numbered
n-1 and n+1, where n is an integer.

12. A method of identifying one or more proteins in an unannotated DNA sequence,
the method comprising: 

(a) performing a six frame translation of a DNA sequence to provide six 
translated amino acid sequences; 
(b) dividing the six translated amino acid sequences into a plurality of
fragments, each fragment comprising 100-1666 amino acids;

(c) subjecting each of the fragments to theoretical digestion to obtain a 
plurality of cleaved peptide sequences; 

(d) comparing experimental empirical data for peptide
fragments from a protein digested in the same manner as the theoretical digestion at
step (c) with theoretical data generated in step (c) for each of the fragments to identify
one or more fragments which include a significant number of peptides present in the
empirically digested protein.

13. The method of claim 12 wherein each of the six translated amino acid sequences is
duplicated and the original and duplicate of each are split in such a manner that the
sequence fragments from the original overlap divisions in the original sequence.
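
By way of illustration only, the steps recited in claims 1, 4, 6 and 10 might be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the function names, the trypsin-like cleavage rule (cut after K or R, ignoring the proline exception), the 5-residue minimum peptide length, the 0.01 Da tolerance and the use of monoisotopic residue masses are all assumptions introduced here and do not appear in the claims.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons (assumed mass model).
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def split_fragments(dna, size=1000, offset=0):
    """Step (a): cut the sequence into (start, fragment) pairs of equal length."""
    return [(i, dna[i:i + size]) for i in range(offset, len(dna), size)]


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translate(dna):
    """Step (b): translate three frames on each strand; '*' marks a stop codon."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for shift in range(3):
            codons = (strand[i:i + 3] for i in range(shift, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def digest(protein, min_len=5):
    """Step (c): theoretical digest, cleaving after K or R (assumed enzyme rule).

    Peptides containing a stop codon ('*') or an unknown residue ('X') are
    discarded, in the spirit of claim 10.
    """
    peptides, current = [], []
    for aa in protein:
        current.append(aa)
        if aa in "KR":
            peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return [p for p in peptides
            if len(p) >= min_len and "*" not in p and "X" not in p]


def peptide_mass(peptide):
    return sum(MASS[aa] for aa in peptide) + WATER


def score_fragment(fragment, observed_masses, tol=0.01):
    """Step (d): count observed masses matched by any theoretical peptide mass."""
    theoretical = {peptide_mass(p)
                   for frame in six_frame_translate(fragment)
                   for p in digest(frame)}
    return sum(any(abs(t - o) <= tol for t in theoretical)
               for o in observed_masses)


def rank_fragments(dna, observed_masses, size=1000, tol=0.01):
    """Score the original fragments plus a half-length-shifted duplicate set
    (claim 6), returning (score, start) pairs with the best score first."""
    fragments = split_fragments(dna, size) + split_fragments(dna, size, size // 2)
    scored = [(score_fragment(seq, observed_masses, tol), start)
              for start, seq in fragments]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

In this sketch the fragment with the highest score is taken to indicate the likely location of the protein of interest, as in claim 4; the half-length shift used for the duplicate set is one possible way of making the second set of fragments straddle the divisions of the first.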



